ABSTRACT

Gang violence is a major problem in the United States, accounting for a large fraction of homicides and other violent crime. In this paper, we study the problem of early identification of violent gang members. Our approach relies on modified centrality measures that take into account additional data about the individuals in the social network of co-arrestees; together with other arrest metadata, these measures provide a rich set of features for a classification algorithm. We show that our approach obtains high precision and recall (0.89 and 0.78, respectively) in the case where the entire network is known, and that it outperforms the approaches currently used by law enforcement in the case where the network is discovered over time by virtue of new arrests, mimicking real-world law-enforcement operations. We also discuss operational issues, as we are preparing to use this method in an operational environment.

Categories and Subject Descriptors: J.4 [Computer 
Applications]: Sociology 

General Terms: Security; Experimentation 
Keywords: Social Network Analysis; Criminology 


1. INTRODUCTION 

Gang violence is a major problem in the United States [1], [5], accounting for 20 to 50 percent of homicides in many major cities [10]. Yet law enforcement already holds data on many of these groups. For example, the underlying social network structure is often recorded by law enforcement and has previously been shown to be useful in enabling “smart policing” tactics [17] and in improving law enforcement's understanding of a gang's organizational structure [19]. In this paper we leverage this gang social network information to create features that allow us to classify individuals as potentially violent. While the results of such a classifier are insufficient to justify arrests, they can provide the police with leads to individuals who are likely to be involved in violence, allowing for more focused policing with respect to patrols and intelligence gathering. Our key aim is to significantly reduce the population of potentially violent gang members under consideration, which will lead to more efficient policing.

In this paper, we introduce a method for identifying potentially violent gang members that feeds features derived from the co-arrestee social network of criminal gangs into a classifier. We note that this classification problem is particularly difficult not only because the data are imbalanced, but also because many violent crimes are committed in moments of heightened emotion and are hence difficult to predict. Though we augment our network-based features with some additional metadata from the arrest records, our approach does not use features concerning the race, ethnicity, or gender of individuals in the social network. We evaluate our method using real-world offender data from the Chicago Police Department. This paper makes the following contributions:

• We discuss how centrality measures such as degree, closeness, and betweenness, when modified to account for metadata about past offenses (such as the type of offense and whether the offense was classified as “violent”), can serve as robust features for identifying violent offenders.

• We show how the network features, combined with other feature categories, provide surprisingly robust performance when the entire offender network is known, in terms of both precision (0.89) and recall (0.78) under 10-fold cross-validation.

• We then test our methods in the case where the network is exposed over time (by virtue of new arrests), which mimics an operational situation. Though precision and recall are reduced in this case, we show that our method significantly outperforms the baseline approach currently in use by law enforcement, on average increasing precision and recall by more than two and three times, respectively.

In addition to these main results, we also present some side results on the structure and nature of the police dataset we examine. The paper is organized as follows. In Section 2 we motivate this difficult problem within the law-enforcement community. This is followed by a description of our dataset along with technical notation in Section 3. There, we also describe some interesting aspects of the gang arrest dataset and our co-arrestee network. In Section 4 we formally define our problem, describe existing approaches, and then describe the features we use in our approach. We then present our results in Section 5, both for the case where we assume the underlying network is known and for the case where we discover the network over time (mimicking an operational scenario). Finally, related work is discussed in Section 6.




2. BACKGROUND 


A recent study shows that the network for gunshot victimization is denser than previously believed [16]. According to the authors, within the city of Chicago over 70% of all gunshot victims are contained within only 6% of the total population. These findings validate what has been considered common knowledge among police for decades: who you hang out with matters, and if you hang out with those who engage in or are victims of violence, you are more likely to become an offender or victim yourself.


Identifying potential offenders of gun violence has also been a staple practice for most law enforcement agencies in an attempt to curtail future victimization. When gang conflicts get “hot,” it is common for law enforcement agents to put together a list of known “shooters”: known gang members with an existing criminal history of gun violence and a predilection for engaging in such illegal activity. Law enforcement agents then attempt to make contact with these individuals in the expectation that such direct contact might prevent violence. For most law enforcement agencies, however, this practice is performed in a very ad hoc manner. Identifying these individuals for intervention has relied primarily on the ability of law enforcement agents to remember and identify at-risk individuals. While feasible for small or discrete networks, recalling multiple individuals in large networks that span large geographic regions and interact with multiple other networks becomes increasingly difficult. This difficulty increases significantly as relationships between networks change, known individuals leave the network, and new individuals enter it. In particular, the practice is less than ideal because it requires officers to recall criminal history and network association data that vary between network members. For example, a subject who has been arrested on multiple occasions for carrying a gun, or who has been arrested for shooting another individual, is easy to recall, but recalling and quantifying the risk for a subject with multiple arrests for non-gun violence and a direct association with several offenders and victims of gun violence can be much more difficult. In short, identifying a known “shooter” is relatively straightforward: they are known. The approach in this paper synthesizes network connectivity with other attributes of the subject to identify at-risk individuals whom law enforcement might not yet know.

Using this information, law enforcement agents may not only identify more reliably and consistently those individuals most likely to engage in acts of violence, or to become victims of violence due to their personal associations with it, but also manage agency resources more effectively. Intervention strategies may include service providers outside law enforcement, such as family members, social service providers, current or former educators, and clergy. This diversity in approach not only delivers a more powerful “stop the violence” message but also provides a kind of force multiplier for law enforcement, increasing the number of persons involved in the effort to prevent violence. Identifying specific individuals for intervention also allows for a more targeted effort by law enforcement in terms of personnel and geographic areas needing coverage. Blanket violence-reduction strategies that saturate geographic areas with law enforcement agents and rely on direct contact with large numbers of criminal network members are inefficient and resource-intensive. Focusing efforts on those individuals most likely to engage in violence allows law enforcement to concentrate on smaller groups of people and smaller geographic areas (the areas those individuals are known to frequent). Therefore, our approach can significantly improve such efforts to identify violent individuals. In this paper, we show that our method not only outperforms the current social network heuristic used by police, but also provides a much smaller and more precise list of potentially violent offenders than simply listing those with a violent criminal record.


3. GANG CO-OFFENDER NETWORK 

In this section, we introduce the basic notation needed to describe our co-offender network, then provide details of our real-world criminal dataset and study some of its properties.

3.1 Technical Preliminaries 

Throughout this paper we represent an offender network as an undirected graph $G = (V, E)$, where the nodes correspond to previous offenders and an undirected edge exists between two offenders if they were arrested together. We use $\mathbf{T}$ to denote the set of timepoints (dates). We also have three sets of labels for the nodes: $\mathcal{V}$, $\mathcal{S}$, and $\mathcal{GANG}$, which are the sets of violent crimes, non-violent crimes, and gangs, respectively. For each timepoint $t$ and each node $v$, the binary variable $arr_v^t \in \{true, false\}$ denotes whether $v$ was arrested at time $t$, and $distr_v^t$, $beat_v^t$, and $gang_v^t$ denote the district, beat, and gang affiliation of $v$ at time $t$ (we assume that time is fine-grained enough to ensure that in each time unit an individual is arrested no more than once). If we drop the $t$ superscript for these three symbols, they denote the most recent district, beat, and gang associated with $v$ in the knowledgebase. We use the sets $\mathcal{V}_v^t$ and $\mathcal{S}_v^t$ to denote the sets of violent and non-violent offenses committed by $v$ at time $t$, respectively. Note that if $arr_v^t = false$, then $\mathcal{V}_v^t = \mathcal{S}_v^t = \emptyset$. We drop the superscript $t$ for these symbols to denote the union of labels over all times $t$ in the historical knowledgebase. We also note that the edges in the graph depend on time, but for the sake of readability, we state in words the duration of time considered for the edges.

For a given crime $c \in \mathcal{V} \cup \mathcal{S}$, we use the notation $V_c^t = \{v \in V \text{ s.t. } c \in \mathcal{V}_v^t \cup \mathcal{S}_v^t\}$ (intuitively, the subset of the population who committed crime $c$ at time $t$). Again, we drop the superscript $t$ to denote those who committed crime $c$ at any time in the historical knowledgebase. For a set of labels $C \subseteq \mathcal{V} \cup \mathcal{S}$, we extend this notation: $V_C^t = \{v \in V \text{ s.t. } C \cap (\mathcal{V}_v^t \cup \mathcal{S}_v^t) \neq \emptyset\}$. We slightly abuse notation here and set $V_\emptyset = V$. We use similar notation to denote the subset of the population that belongs to a certain gang; for instance, $V_{gang_v}$ refers to the set of nodes in the same gang as node $v$. Likewise, we use the same notation for subgraphs: $G_C$ is the subgraph of $G$ containing only the nodes in $V_C$ and their adjacent edges. We use the function $d : V \times V \rightarrow \mathbb{N}$ to denote the distance between two nodes, which in this paper is the number of links in the shortest path. For a given node $v$, $N_v^i = \{v' \in V \text{ s.t. } d(v, v') = i\}$ is the set of nodes whose shortest path from $v$ is exactly $i$ hops. For two nodes $v, v'$, we use $\sigma(v, v')$ to denote the number of shortest paths between $v$ and $v'$. For nodes $u, v, v'$, $\sigma_u(v, v')$ is the number of shortest paths between $v$ and $v'$ that pass through $u$.
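To make the path notation concrete, here is a minimal Python sketch, assuming the graph is stored as a dictionary of adjacency sets and each offender's offense labels (the union over time) as a set of strings. The helper names (`bfs`, `hop_set`, `sigma`, `sigma_through`, `members`) and the toy graph are hypothetical illustrations, not part of the original work. It uses the standard identity that $\sigma_u(v, w) = \sigma(v, u)\,\sigma(u, w)$ whenever $d(v, u) + d(u, w) = d(v, w)$, and 0 otherwise.

```python
from collections import deque
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]  # undirected graph stored as adjacency sets

def bfs(g: Graph, source: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Hop distance d(source, x) and shortest-path count sigma(source, x)."""
    dist: Dict[str, int] = {source: 0}
    count: Dict[str, int] = {source: 1}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in g[x]:
            if y not in dist:            # first time y is reached
                dist[y] = dist[x] + 1
                count[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:   # x precedes y on a shortest path
                count[y] += count[x]
    return dist, count

def hop_set(g: Graph, v: str, i: int) -> Set[str]:
    """N_v^i: nodes whose shortest path from v is exactly i hops."""
    dist, _ = bfs(g, v)
    return {x for x, d in dist.items() if d == i}

def sigma(g: Graph, v: str, w: str) -> int:
    """sigma(v, w): number of shortest v-w paths (0 if disconnected)."""
    return bfs(g, v)[1].get(w, 0)

def sigma_through(g: Graph, u: str, v: str, w: str) -> int:
    """sigma_u(v, w): shortest v-w paths passing through intermediate node u."""
    if u in (v, w):
        return 0
    dv, sv = bfs(g, v)
    du, su = bfs(g, u)
    if w not in dv or u not in dv:
        return 0
    # u lies on a shortest v-w path iff d(v, u) + d(u, w) = d(v, w)
    if dv[u] + du[w] == dv[w]:
        return sv[u] * su[w]
    return 0

def members(labels: Dict[str, Set[str]], crimes: Set[str]) -> Set[str]:
    """V_C: nodes with at least one offense in C; by convention V_emptyset = V."""
    if not crimes:
        return set(labels)
    return {v for v, offenses in labels.items() if offenses & crimes}

# Hypothetical toy graph: a 4-cycle a-b-d-c-a with a pendant node e on d.
G: Graph = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d"},
            "d": {"b", "c", "e"}, "e": {"d"}}
print(hop_set(G, "a", 2), sigma(G, "a", "e"), sigma_through(G, "b", "a", "e"))
```

Running one BFS per endpoint keeps the sketch short; a production implementation would cache the BFS results instead of recomputing them per query.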



Table 1: Summary of arrest data.

Name                        Value
Number of records           64466
Violent offense              4450
Homicide                      312
Criminal sexual assault       153
Robbery                      1959
Aggravated assault           1441
Aggravated battery            896
Non violent offense         60016


Table 2: Network properties.

Name                                              Value
Vertices                                           9373
Edges                                             17197
Average degree                                     3.66
Average clustering                                 0.5
Transitivity                                       0.62
Connected components                               1843
Largest connected component diameter               36
Largest connected component average path length    12.22
Largest connected component average clustering     0.63


For a given subgraph $G'$ of $G$, we use $C(G')$ to denote the largest connected component of $G'$, and for a node $v \in G'$, we use $C_v(G')$ to denote the connected component of $G'$ to which $v$ belongs. If we apply a community-finding algorithm to subgraph $G'$, we use $P_v(G')$ to denote the partition of $G'$ to which $v$ belongs. We use $|\cdot|$ to denote the size of a set or the number of nodes in a subgraph.
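The component notation $C(G')$ and $C_v(G')$ can be illustrated with a short breadth-first search. This is a sketch under our own assumptions: the graph is a dictionary of adjacency sets, `induced` is a hypothetical helper that builds the subgraph on a node set (our reading of $G_C$ for the node set $V_C$), and ties between equally large components are broken arbitrarily.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected graph stored as adjacency sets

def induced(g: Graph, nodes: Set[str]) -> Graph:
    """Subgraph of g restricted to `nodes` and the edges among them."""
    return {x: g[x] & nodes for x in nodes}

def component_of(g: Graph, v: str) -> Set[str]:
    """C_v(G'): node set of the connected component containing v."""
    seen = {v}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in g[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen

def largest_component(g: Graph) -> Set[str]:
    """C(G'): node set of a largest connected component (ties broken arbitrarily)."""
    best: Set[str] = set()
    unvisited = set(g)
    while unvisited:
        comp = component_of(g, next(iter(unvisited)))
        unvisited -= comp
        if len(comp) > len(best):
            best = comp
    return best

# Hypothetical graph with two components: a-b-c and x-y.
G: Graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "x": {"y"}, "y": {"x"}}
print(largest_component(G), component_of(G, "x"))
```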

3.2 Overview of Network Data 

In this section we describe our police dataset and the associated co-offender network, as well as some interesting characteristics that we have noticed.

Police Dataset. Our dataset consists of gang-related arrest incidents gathered in Chicago between August 2011 and August 2014, as well as the offenders' immediate associates. This dataset includes locations, dates, the links between joint arrests, and the gang affiliation of the offenders. In Table 1 we summarize some of the important characteristics of the dataset.

Violent Crimes. In our dataset, the set $\mathcal{V}$ consists of the following crimes, which have been identified by the Chicago Police as violent crimes: homicide (first- or second-degree murder), criminal sexual assault, robbery, aggravated assault, and aggravated battery. All of these offenses are also FBI “index” crimes. A key aspect of the violent crimes is that the dataset is highly imbalanced, with many more arrests for non-violent crimes than for violent crimes (60016 vs. 4450).

Network Properties. From the arrest data, we were able to construct the co-offender network. In this network, isolated vertices are eliminated due to their lack of structural information. A visualization of the network is depicted in Figure 1, and we have included summary statistics in Table 2. In studying this network, we examined its degree distribution (Figure 2). Unlike the degree distributions of many scale-free social networks, the degree distribution of the offender network is exponential rather than power law. However, despite the degree distribution being similar to that of a random (E-R) or small-world network topology [27], we noticed other characteristics that indicate otherwise. The co-offender network has a much higher average clustering coefficient than a random network, and it does not follow the properties of the small-world topology due to its relatively high diameter and average shortest path length (computed for the largest connected component).

Figure 1: The gang co-offender network. Each color corresponds to a different gang.
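To make the comparison with a random graph concrete, the sketch below computes the two clustering statistics reported in Table 2 under their textbook definitions, plus the expected clustering of an Erdős–Rényi graph with the same number of vertices and mean degree (for $G(n, p)$ the expected clustering equals $p = \langle k \rangle / (n - 1)$). It is a plain-Python illustration under these standard definitions; we do not claim it reproduces exactly how Table 2 was computed, and the toy graph is hypothetical.

```python
from itertools import combinations
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected graph stored as adjacency sets

def closed_pairs(g: Graph, v: str) -> int:
    """Number of linked pairs among v's neighbours (= triangles through v)."""
    return sum(1 for a, b in combinations(g[v], 2) if b in g[a])

def average_clustering(g: Graph) -> float:
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for v in g:
        k = len(g[v])
        if k >= 2:
            total += closed_pairs(g, v) / (k * (k - 1) / 2)
    return total / len(g)

def transitivity(g: Graph) -> float:
    """Global clustering: 3 * triangles / connected triples."""
    closed = sum(closed_pairs(g, v) for v in g)  # each triangle counted 3 times
    triples = sum(len(g[v]) * (len(g[v]) - 1) // 2 for v in g)
    return closed / triples if triples else 0.0

# Expected clustering of an Erdos-Renyi G(n, p) graph with the same mean degree.
n, mean_degree = 9373, 3.66   # values from Table 2
p_er = mean_degree / (n - 1)  # about 3.9e-4, versus an observed average clustering of 0.5

# Hypothetical toy graph: triangle a-b-c with a pendant node d attached to c.
T: Graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(average_clustering(T), transitivity(T), p_er)
```

The toy graph also shows why the two statistics in Table 2 differ: average clustering weights every node equally, while transitivity weights nodes by their number of neighbour pairs.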



Figure 2: Network degree distribution (x-axis: degree). An exponential function fits the distribution ($R^2 = 0.77$).
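One common way to check an exponential fit of a degree histogram is an ordinary least-squares line through the logarithm of the counts. The sketch below assumes that approach; the paper does not state how its $R^2$ was obtained, so the function name `fit_exponential`, the log-space $R^2$, and the synthetic histogram are all assumptions for illustration.

```python
import math
from typing import List, Tuple

def fit_exponential(degrees: List[int], counts: List[float]) -> Tuple[float, float, float]:
    """Fit counts ~ A * exp(b * degree) by least squares on log(counts).
    Returns (A, b, R^2 of the fit in log space). All counts must be positive."""
    xs = [float(x) for x in degrees]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope of the log-linear fit
    a = mean_y - b * mean_x            # intercept, i.e. log(A)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return math.exp(a), b, 1.0 - ss_res / ss_tot

# Hypothetical degree histogram that decays exactly exponentially.
degs = [1, 2, 3, 4, 5, 6]
cnts = [1000 * math.exp(-0.7 * d) for d in degs]
A, b, r2 = fit_exponential(degs, cnts)
print(A, b, r2)
```

A power-law fit would instead regress log(count) on log(degree); comparing the two residuals is a quick way to see which form describes a histogram better.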

Repeat Offenders. There are many instances of repeated offenses by the same offender. Figure 3 shows the distribution of repeated arrests for each individual in the dataset. This indicates that arrest records have utility in identifying future offenders.

Figure 3: Repeated arrests (x-axis: number of arrests). The 12866 instances of one-time arrests have been removed.

Seasonality of Crime. The likelihood of criminal activity also varies across months of the year. Figure 4 demonstrates some of these variations. Consistent with police observations, both violent and non-violent crime incidents are lower in the winter months (Dec.-Feb.).

Figure 4: Seasonality of crime (x-axis: month; series: 2011-2012, 2012-2013, 2013-2014).

4. IDENTIFYING VIOLENT OFFENDERS 


In this section, we describe our problem, some of the existing practical approaches used by law enforcement, and our approach based on supervised learning with features primarily generated from the network topology.

4.1 Problem Statement 

Given a co-offender network $G = (V, E)$ and, for each historical timepoint $t \in \mathbf{T} = \{1, \ldots, t_{max}\}$ and each $v \in V$, the values of $arr_v^t$, $distr_v^t$, $beat_v^t$ and the elements of the sets $\mathcal{V}_v^t$, $\mathcal{S}_v^t$, $gang_v^t$, we wish to identify the set $\{v \in V \text{ s.t. } \exists t > t_{max} \text{ where } |\mathcal{V}_v^t| > 0\}$. In other words, we wish to find the set of offenders in our current co-offender network who will commit a violent crime in the future.
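The target set in the problem statement can be sketched directly from arrest records. This minimal version assumes each record carries the offender, an integer timepoint, and the violent offenses charged ($\mathcal{V}_v^t$); the `Arrest` record type and both helper names are hypothetical. It also approximates $V$ by everyone arrested up to $t_{max}$, whereas the paper's network additionally drops isolated vertices.

```python
from typing import FrozenSet, Iterable, NamedTuple, Set

class Arrest(NamedTuple):
    person: str
    t: int                   # timepoint index
    violent: FrozenSet[str]  # V_v^t: violent offenses charged in this arrest

def observed_nodes(arrests: Iterable[Arrest], t_max: int) -> Set[str]:
    """Offenders seen in the historical window {1, ..., t_max} (approximates V)."""
    return {a.person for a in arrests if a.t <= t_max}

def future_violent(arrests: Iterable[Arrest], t_max: int) -> Set[str]:
    """{v in V : exists t > t_max with |V_v^t| > 0}."""
    records = list(arrests)
    known = observed_nodes(records, t_max)
    return {a.person for a in records
            if a.t > t_max and len(a.violent) > 0 and a.person in known}

history = [Arrest("a", 1, frozenset()), Arrest("b", 2, frozenset({"robbery"})),
           Arrest("a", 5, frozenset({"homicide"})), Arrest("c", 6, frozenset({"robbery"}))]
print(future_violent(history, t_max=3))
```

Offender "c" is violent after $t_{max}$ but is not part of the current network, so by the problem definition it is not a target.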

4.2 Existing Methods 

Here we describe two common techniques often used by law enforcement to predict violent offenders. The first is a simple heuristic based on past violent activity. The second is a heuristic based on the findings of [17], which was designed to locate future victims of violent crime. Both of these are ad hoc practical approaches that have become “best practices” for predicting violent offenders. However, we are not aware of any data-driven, formal evaluation of these methods in the literature.

Past Violent Activities (PVA). The first ad hoc approach is quite simple: if an offender has committed a violent crime in the past, we predict that he or she will commit a violent crime in the future. An obvious variant of this approach is to return the set of violent offenders from the last $\Delta t$ days. We note that in practice, if police also have records of those who are incarcerated, such individuals would be removed from the list. (Due to the different jurisdictions of police and corrections in the Chicago area, we did not have access to incarceration data; however, we discussed the re-arrests observed in the data in the previous section.)
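A minimal sketch of PVA and its $\Delta t$ variant follows, assuming integer day indices and the hypothetical `Arrest` record used above. Treating the window as the half-open interval $(t_{now} - \Delta t, t_{now}]$ is our choice, and the optional `incarcerated` argument models the removal step that the text describes but that the experiments could not perform.

```python
from typing import AbstractSet, FrozenSet, Iterable, NamedTuple, Optional, Set

class Arrest(NamedTuple):
    person: str
    t: int                   # day index
    violent: FrozenSet[str]  # violent offenses charged in this arrest

def pva(arrests: Iterable[Arrest], t_now: int, window: Optional[int] = None,
        incarcerated: AbstractSet[str] = frozenset()) -> Set[str]:
    """Past Violent Activities: everyone with a violent arrest up to t_now.
    With `window` (the Delta-t variant) only arrests in (t_now - window, t_now]
    count. People listed in `incarcerated` are dropped from the result."""
    flagged: Set[str] = set()
    for a in arrests:
        if len(a.violent) == 0 or a.t > t_now:
            continue
        if window is not None and a.t <= t_now - window:
            continue
        flagged.add(a.person)
    return flagged - set(incarcerated)

history = [Arrest("a", 1, frozenset()), Arrest("b", 2, frozenset({"robbery"})),
           Arrest("a", 5, frozenset({"homicide"})), Arrest("c", 6, frozenset({"robbery"}))]
print(pva(history, t_now=6, window=2))
```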

Two-Hop Heuristic (THH). The two-hop heuristic is based on the result of [17], which investigated a social network of gunshot victims in Boston and found an inverse relationship between the probability of being a gunshot victim and the shortest-path distance on the network to the nearest previous gunshot victim. Hence, THH returns all neighbors one and two hops away from previous violent criminals (see Algorithm 1 for details on the version we used in our experiments, which was the best-performing variant for our data). The Chicago Police have adopted a variant of this method to identify potential gang victims using a combination of arrest and victim data: the co-arrestee network of criminal gang members includes many individuals who are also victims of violent crime (a direct result of gang conflict). We note that victim information did not offer a significant improvement to our approach, except in the trivial sense that a homicide victim cannot commit any crime in the future.
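The two-hop heuristic can be sketched as follows. This follows the structure of Algorithm 1 (seed nodes, their 1-hop and 2-hop neighbourhoods, a per-node selection), but the selection condition on line 6 of Algorithm 1 is not legible in this copy, so the sketch takes it as a caller-supplied predicate `keep` (a hypothetical parameter) that by default keeps every node. Function names and the toy graph are illustrative assumptions.

```python
from typing import Callable, Dict, Iterable, Set

Graph = Dict[str, Set[str]]  # undirected graph stored as adjacency sets

def within_two_hops(g: Graph, v: str) -> Set[str]:
    """N_v^1 union N_v^2: nodes at distance 1 or 2 from v."""
    one = set(g[v])
    two = {y for x in one for y in g[x]} - one - {v}
    return one | two

def two_hop(g: Graph, seeds: Iterable[str],
            keep: Callable[[str], bool] = lambda u: True) -> Set[str]:
    """Union of the 1- and 2-hop neighbourhoods of the seed nodes (homicide
    victims in Algorithm 1), filtered by the hypothetical predicate `keep`."""
    flagged: Set[str] = set()
    for v in seeds:
        if v in g:  # ignore seeds that are not in the network
            flagged |= {u for u in within_two_hops(g, v) if keep(u)}
    return flagged

# Hypothetical path v-a-b-c with v as the only seed.
P: Graph = {"v": {"a"}, "a": {"v", "b"}, "b": {"a", "c"}, "c": {"b"}}
print(two_hop(P, ["v"]))
```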

4.3 Supervised Learning Approach 

We evaluated many different supervised learning approaches, including Naive Bayes (NB), Linear Regression (LR), Decision Tree (DT), Random Forest (RF), Neural Network (NN), and support vector machines (SVM), on the same set of node features, which we describe in this section. We also explored combining these approaches with techniques for imbalanced data such as SMOTE [4] and Borderline SMOTE [9]; however, we do not report the results of Borderline SMOTE, as it provided no significant difference from SMOTE. We group our features into four categories: (1.) neighborhood-based (having to do with the immediate neighbors of a given node), (2.) network-based (features that require consideration of more than a node's immediate and nearby neighbors), (3.) temporal characteristics, and (4.) geographic characteristics.

Algorithm 1 Two-Hop Heuristic
1: procedure TwoHop(G)                         ▷ Offender network G.
2:   R ← {}                                    ▷ Identified violent offenders.
3:   VICTIMS ← {u ∈ G | is_homicide_victim(u)}
4:   for v ∈ VICTIMS do
5:     N ← N_v^1 ∪ N_v^2                       ▷ 1-hop and 2-hop neighbors
6:     R ← R ∪ {u ∈ N s.t. 14 = 0}
7:   return R
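For readers unfamiliar with SMOTE [4], the sketch below shows its core idea in plain Python: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. It is a simplified illustration (random choice of base samples, brute-force neighbour search), not the implementation used in the experiments; in practice one would normally use a library implementation such as the `SMOTE` and `BorderlineSMOTE` classes in imbalanced-learn.

```python
import math
import random
from typing import List

def smote(minority: List[List[float]], n_new: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # brute-force k nearest neighbours of x among the other minority samples
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: math.dist(x, minority[j]))[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

# Hypothetical minority samples lying on the line y = x.
print(smote([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], n_new=3, k=2, seed=1))
```

Because every synthetic point lies between two real minority points, SMOTE densifies the minority region rather than duplicating samples, which is why it is a common companion to classifiers on imbalanced data such as the violent/non-violent split here.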

4.3.1 Neighborhood-Based Features 

Neighborhood-based features are computed using each node and its first- and/or second-level neighbors in $G$, often with respect to some $C \subseteq \mathcal{V} \cup \mathcal{S}$. The simplest such measure is the degree of vertex $v$, corresponding to the number of offenders arrested with $v$. We can easily extend this to some set of crimes of interest $C$, where we look at all the neighbors of $v$ who have committed a crime in $C$. This generalizes degree (which is the case $C = \emptyset$). In our experiments, we found the most useful neighborhood features to be those with $C = \mathcal{V}$, though standard degree ($C = \emptyset$) was also used. We also found that Boolean combinations built from the following definition proved useful:

$$maj_v(C, i) = \Big|\Big\{u \mid u \in \Big(\bigcup_{j=1}^{i} N_v^j\Big) \cap V_C\Big\}\Big| > 0.5 \times \Big|\bigcup_{j=1}^{i} N_v^j\Big|$$

Intuitively, $maj_v(C, i)$ is true if more than half of the nodes within a network distance of $i$ from node $v$ have committed a crime in $C$, and false otherwise. Using these intuitions, we explored the space of variants of these neighborhood-based features and list those we found to be best-performing in Table 3.
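The neighborhood features of Table 3 reduce to set operations on bounded breadth-first searches. The sketch below assumes adjacency sets and a precomputed node set `vc` standing for $V_C$; it follows our reading of $maj_v(C, i)$ as ranging over all nodes within $i$ hops, and all function names are hypothetical.

```python
from collections import deque
from typing import AbstractSet, Dict, Set

Graph = Dict[str, Set[str]]  # undirected graph stored as adjacency sets

def hop_distances(g: Graph, v: str, max_hops: int) -> Dict[str, int]:
    """Hop distance from v to every node at most max_hops away (v itself at 0)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if dist[x] == max_hops:
            continue  # do not expand past the radius
        for y in g[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def ring(g: Graph, v: str, i: int) -> Set[str]:
    """N_v^i: nodes exactly i hops from v."""
    return {x for x, d in hop_distances(g, v, i).items() if d == i}

def degree_wrt(g: Graph, v: str, vc: AbstractSet[str]) -> int:
    """Degree w.r.t. C: neighbours of v that belong to V_C."""
    return len(g[v] & vc)

def fraction_wrt(g: Graph, v: str, vc: AbstractSet[str], i: int) -> float:
    """Fraction of i-hop neighbours of v that belong to V_C (0 if there are none)."""
    r = ring(g, v, i)
    return len(r & vc) / len(r) if r else 0.0

def maj(g: Graph, v: str, vc: AbstractSet[str], i: int) -> bool:
    """maj_v(C, i): more than half of the nodes within i hops of v are in V_C."""
    ball = {x for x, d in hop_distances(g, v, i).items() if 1 <= d <= i}
    return len(ball & vc) > 0.5 * len(ball)

# Hypothetical example: "minority of 1-hop and majority of 2-hop" holds for v.
g: Graph = {"v": {"a", "c"}, "a": {"v", "b"}, "b": {"a"}, "c": {"v"}}
vc = {"b", "c"}
print((not maj(g, "v", vc, 1)) and maj(g, "v", vc, 2))
```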

4.3.2 Network-Based Features 

Network-based features fall into two sub-categories that we describe in this section: community-based and path-based.

Network-based community features. We use several notions of a node's community when engineering features: the connected component to which a node belongs, the gang to which a node belongs, and what we will refer to as an individual's group. The connected component is simply based on the overall network structure, while the gang is the subgraph induced by the individuals in the network who belong to the same gang (the social network of node $v$'s gang is denoted $G_{gang_v}$). A node's group is defined as the partition he/she belongs to in a partition of $G_{gang_v}$ found using the Louvain algorithm [5]. We found in our previous work [19], and in ensuing experience with the Chicago Police, that the groups produced by this method were highly relevant operationally. In this work, we also examined other community-finding methods (namely Infomap and Spectral Clustering) and found that we obtained the best results using the Louvain algorithm. We list the best-performing network-based community features that we used in Table 4. Of particular interest, we found that for an individual $v$, features relating to the size of the largest connected component resulting from the removal of $v$ from his/her connected component were useful. Another interesting pair of features we noted, for both group and gang, was the number of edges from members of that group/gang to a different group or gang. We hypothesize that the utility of these features is a result of conflicts between the groups/gangs they are connected to, as well as the spread of violence among different groups (i.e., if two groups are closely connected, one may conduct violent activities on behalf of the other).

Table 3: Neighborhood-Based Features

Description                                                                   Definition
Degree (w.r.t. C)                                                             $|\{u \mid u \in N_v^1 \cap V_C\}|$
Fraction of 1-hop neighbors committing a crime in C                           $|\{u \mid u \in N_v^1 \cap V_C\}| / |N_v^1|$
Fraction of 2-hop neighbors committing a crime in C                           $|\{u \mid u \in N_v^2 \cap V_C\}| / |N_v^2|$
Majority of 1-hop and 2-hop neighbors committing a crime in C                 $maj_v(C, 1) \wedge maj_v(C, 2)$
Minority of 1-hop and majority of 2-hop neighbors committing a crime in C     $\neg maj_v(C, 1) \wedge maj_v(C, 2)$

Table 4: Network-Based Features (Community)

Description                                                     Definition
Component size when v is removed                                $|C(C_v(G) \setminus \{v\})|$
Largest component size with a violent node after v is removed   $\max_{v' \in (C_v(G) \setminus \{v\}) \cap V_{\mathcal{V}}} |X_{v'}|$, where $X_{v'} = C_{v'}(C_v(G) \setminus \{v\})$
Group size                                                      $|P_v(G_{gang_v})|$
Relationships within the group                                  $|\{(u, w) \in E \text{ s.t. } u, w \in P_v(G_{gang_v})\}|$
Number of violent members in the group                          $|\{u \in P_v(G_{gang_v}) \text{ s.t. } \mathcal{V}_u \neq \emptyset\}|$
Triangles in group                                              No. of triangles within subgraph $P_v(G_{gang_v})$
Transitivity of group                                           (No. of triangles in $P_v(G_{gang_v})$) / (No. of “V”s in $P_v(G_{gang_v})$)
Group-to-group connections                                      $|\{u \in P_v(G_{gang_v}) \text{ s.t. } \exists (u, w) \in E \text{ where } w \notin P_v(G_{gang_v})\}|$
Gang-to-gang connections                                        $|\{u \in G_{gang_v} \text{ s.t. } \exists (u, w) \in E \text{ where } w \notin G_{gang_v}\}|$

Table 5: Network-Based Features (Path)

Description                 Definition
Betweenness (w.r.t. C)      $\sum_{u, w \in V_C} \frac{\sigma_v(u, w)}{\sigma(u, w)}$
Closeness (w.r.t. C)        $(|V_C| - 1) / \sum_{u \in V_C} d(u, v)$
Shell number (w.r.t. C)     $shell_C(v)$ (see appendix for further details)
Propagation (w.r.t. C)      1 if $v \in F_C(k)$, 0 otherwise (see appendix for further details)

Table 6: Geographic Features

Name                 Definition
District frequency   $|\{(t, v') \text{ s.t. } arr_{v'}^t = true \wedge \exists t' \text{ s.t. } distr_{v'}^t = distr_v^{t'}\}|$
Beat frequency       $|\{(t, v') \text{ s.t. } arr_{v'}^t = true \wedge \exists t' \text{ s.t. } beat_{v'}^t = beat_v^{t'}\}|$
Beat violence        $|\{(t, v') \text{ s.t. } arr_{v'}^t = true \wedge \mathcal{V}_{v'}^t \neq \emptyset \wedge \exists t' \text{ s.t. } beat_{v'}^t = beat_v^{t'}\}|$
District violence    $|\{(t, v') \text{ s.t. } arr_{v'}^t = true \wedge \mathcal{V}_{v'}^t \neq \emptyset \wedge \exists t' \text{ s.t. } distr_{v'}^t = distr_v^{t'}\}|$
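Several of the community features in Table 4 come down to recomputing connected components after deleting one node. The sketch below assumes adjacency sets, a set `violent` standing for $V_{\mathcal{V}}$, and a group given as a node set (for instance, one block of a Louvain partition such as those returned by networkx's `louvain_communities`); the helper names are hypothetical.

```python
from collections import deque
from typing import AbstractSet, Dict, List, Set

Graph = Dict[str, Set[str]]  # undirected graph stored as adjacency sets

def components(g: Graph, nodes: AbstractSet[str]) -> List[Set[str]]:
    """Connected components of the subgraph of g restricted to `nodes`."""
    remaining = set(nodes)
    comps: List[Set[str]] = []
    while remaining:
        start = remaining.pop()
        comp, queue = {start}, deque([start])
        while queue:
            x = queue.popleft()
            for y in g[x]:
                if y in remaining:  # y is in `nodes` and not yet visited
                    remaining.remove(y)
                    comp.add(y)
                    queue.append(y)
        comps.append(comp)
    return comps

def component_of(g: Graph, v: str) -> Set[str]:
    """C_v(G): the connected component of the whole graph that contains v."""
    return next(c for c in components(g, set(g)) if v in c)

def size_after_removal(g: Graph, v: str) -> int:
    """Largest component left when v is removed from its own component."""
    rest = component_of(g, v) - {v}
    return max((len(c) for c in components(g, rest)), default=0)

def largest_violent_after_removal(g: Graph, v: str, violent: AbstractSet[str]) -> int:
    """Largest component containing a violent node once v is removed (0 if none)."""
    rest = component_of(g, v) - {v}
    return max((len(c) for c in components(g, rest) if c & violent), default=0)

def boundary_nodes(g: Graph, group: AbstractSet[str]) -> int:
    """Group-to-group (or gang-to-gang) connections as defined in Table 4:
    members with at least one edge leaving the group."""
    return sum(1 for u in group if g[u] - group)

# Hypothetical graph: path a-b-c-d with an extra leaf e attached to b.
g: Graph = {"a": {"b"}, "b": {"a", "c", "e"}, "c": {"b", "d"}, "d": {"c"}, "e": {"b"}}
print(size_after_removal(g, "b"), largest_violent_after_removal(g, "b", {"a"}))
```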


Network-based path features. We looked at several features that leverage the paths in the network by adopting three common node metrics from the literature, namely betweenness, closeness [6], and shell number [23], as well as a propagation process based on a deterministic tipping model [8]. The features are listed in Table 5. We examined our modified definitions of closeness, betweenness, and shell number where $C$ is a single element of $\mathcal{V}$, where $C = \mathcal{V}$, and where $C = \emptyset$ (which yields the standard definitions of these measures). Our intuition was that individuals nearer in the network to other violent individuals would also tend to be more violent, and we found several interesting relationships, such as that for closeness (where $C = \mathcal{V}$) discussed in Section 5.1, when we ran the classifier on each feature group.
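The modified closeness and betweenness of Table 5 differ from the standard measures only in which endpoints are summed over. The sketch below is one way to compute them, under stated assumptions: betweenness sums over unordered pairs $\{u, w\} \subseteq V_C$ with $u, w \neq v$, and closeness uses only the members of $V_C$ reachable from $v$ (the formula in Table 5 does not say how unreachable nodes are handled). Passing all nodes as `vc` corresponds to the case $C = \emptyset$ (unnormalized standard betweenness). Function names are hypothetical.

```python
from collections import deque
from typing import AbstractSet, Dict, Set, Tuple

Graph = Dict[str, Set[str]]  # undirected graph stored as adjacency sets

def bfs(g: Graph, s: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Hop distances from s and numbers of shortest paths from s."""
    dist, count = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        for y in g[x]:
            if y not in dist:
                dist[y], count[y] = dist[x] + 1, 0
                queue.append(y)
            if dist[y] == dist[x] + 1:
                count[y] += count[x]
    return dist, count

def closeness_wrt(g: Graph, v: str, vc: AbstractSet[str]) -> float:
    """(|V_C| - 1) / sum_{u in V_C} d(u, v), using only members reachable from v."""
    dist, _ = bfs(g, v)
    reach = [u for u in vc if u in dist]
    total = sum(dist[u] for u in reach)
    return (len(reach) - 1) / total if total > 0 else 0.0

def betweenness_wrt(g: Graph, v: str, vc: AbstractSet[str]) -> float:
    """Sum over unordered pairs {u, w} of V_C (u, w != v) of sigma_v(u, w) / sigma(u, w)."""
    dv, sv = bfs(g, v)
    members = sorted(u for u in vc if u != v and u in dv)
    score = 0.0
    for i, u in enumerate(members):
        du, su = bfs(g, u)
        for w in members[i + 1:]:
            if du[v] + dv[w] == du[w]:  # v lies on a shortest u-w path
                score += su[v] * sv[w] / su[w]
    return score

# Hypothetical 4-cycle a-b-d-c-a: b carries one of the two shortest a-d paths.
sq: Graph = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d"}, "d": {"b", "c"}}
print(betweenness_wrt(sq, "b", {"a", "d"}), closeness_wrt(sq, "a", {"a", "d"}))
```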
Shell number and the propagation process were used to capture the idea of the spread of violence (as shell number was previously shown to correspond to “spreaders” in various network epidemic models [11]). For the propagation process, we set the threshold $k$ to two, three, four, five, and six. Further details on shell number and the propagation process can be found in the appendix.
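The exact $C$-restricted definitions of $shell_C(v)$ and of the propagation process are given in the appendix, which is not part of this section, so the sketch below only illustrates the two standard building blocks under our own assumptions: the core (shell) number computed by iterative peeling, and a deterministic threshold process in which an inactive node activates once at least $k$ of its neighbours are active. Seeding the process with $V_C$ and reading the feature as "is $v$ active at the end" are hypothetical choices, as are all function names.

```python
from typing import AbstractSet, Dict, Set

Graph = Dict[str, Set[str]]  # undirected graph stored as adjacency sets

def core_numbers(g: Graph) -> Dict[str, int]:
    """Shell (core) number of every node via iterative peeling."""
    deg = {v: len(g[v]) for v in g}
    remaining = set(g)
    core: Dict[str, int] = {}
    k = 0
    while remaining:
        low = [v for v in remaining if deg[v] <= k]
        if not low:
            k += 1
            continue
        while low:  # peel every node whose residual degree is at most k
            v = low.pop()
            if v not in remaining:
                continue
            remaining.remove(v)
            core[v] = k
            for u in g[v]:
                if u in remaining:
                    deg[u] -= 1
                    if deg[u] <= k:
                        low.append(u)
    return core

def tipping(g: Graph, seeds: AbstractSet[str], k: int) -> Set[str]:
    """Deterministic threshold process: an inactive node activates once at least
    k of its neighbours are active; runs until no node changes."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in g:
            if v not in active and len(g[v] & active) >= k:
                active.add(v)
                changed = True
    return active

def propagation_feature(g: Graph, seeds: AbstractSet[str], k: int, v: str) -> int:
    """1 if v ends up active, 0 otherwise (hypothetical reading of Table 5)."""
    return int(v in tipping(g, seeds, k))

# Hypothetical graph: triangle a-b-c with a pendant node d attached to c.
T: Graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(core_numbers(T), propagation_feature(T, {"a", "b"}, 2, "c"))
```

Because activation is monotone, the final active set of the threshold process does not depend on the order in which nodes are scanned, which is what makes the process deterministic.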


4.3.3 Geographic Features 

Geographic features capture information related to the location of a crime incident. The intuition is that individuals who commit crimes in violent districts are more likely to become violent than others. We found that the beat in which an individual has committed a crime is an important feature for our problem. This is in accordance with well-known previous literature in criminology [3], [21] that studies spatio-temporal modeling of criminal behavior. The complete list is shown in Table 6.
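The four geographic features of Table 6 share one counting pattern: collect every district (or beat) in which $v$ has been arrested, then count all arrest events $(t, v')$ located there, optionally only violent ones. The sketch below assumes a flat list of arrest records with a boolean `violent` flag; the record type and the function name `area_count` are hypothetical.

```python
from typing import Iterable, List, NamedTuple

class Arrest(NamedTuple):
    person: str
    t: int
    district: str
    beat: str
    violent: bool  # True if the arrest includes an offense in the violent set

def area_count(arrests: Iterable[Arrest], v: str, area: str,
               violent_only: bool = False) -> int:
    """Count arrest events (t, v') located in any district/beat where v was arrested.
    area='district' or 'beat' gives the frequency features; violent_only=True gives
    the corresponding violence features."""
    if area not in ("district", "beat"):
        raise ValueError("area must be 'district' or 'beat'")
    events: List[Arrest] = list(arrests)
    places = {getattr(a, area) for a in events if a.person == v}
    return sum(1 for a in events
               if getattr(a, area) in places and (a.violent or not violent_only))

history = [Arrest("a", 1, "D1", "B1", False), Arrest("b", 2, "D1", "B2", True),
           Arrest("c", 3, "D2", "B3", True), Arrest("a", 4, "D1", "B2", False)]
print(area_count(history, "a", "beat", violent_only=True))
```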

4.3.4 Temporal Features 

We considered two temporal features: average interval time (measured in months) and number of violent groups. Average interval time considers the average duration between consecutive arrests of the offender. The other feature we examine is the number of violent groups that appeared over time in the environment. We found that the number of violent groups is an important temporal aspect for identifying violent criminals. The key intuition is that if at least one member of one of the offender's groups (formed over time) is violent, then we consider the offender part of that violent group. For an individual $v$, we define the ordered set $\mathbf{T}_v^C = \{t \text{ s.t. } arr_v^t = true \wedge v \in V_C^t\}$ (intuitively, the set of timepoints at which $v$ committed at least one of the crimes in $C$). We also define $\Delta_i(C) = t_i - t_{i-1}$ for each $t_i \in \mathbf{T}_v^C$. Using these definitions, we formally define the temporal features in Table 7.


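Table 7: Temporal Features

Name                                Definition
Average interval time (w.r.t. C)    $\sum_i \Delta_i(C) / |\mathbf{T}_v^C|$
Number of violent groups            $|\{t \text{ s.t. } arr_v^t = true \wedge \exists v' \text{ s.t. } arr_{v'}^t = true \wedge \mathcal{V}_{v'}^t \neq \emptyset \wedge v' \in N_v^1\}|$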
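Both temporal features in Table 7 can be sketched in a few lines. This version assumes integer month indices and represents each joint arrest as an `Incident` record listing everyone arrested together and the subset charged with a violent offense (a hypothetical data layout). Following Table 7, the average interval divides by $|\mathbf{T}_v^C|$ rather than by the number of gaps, so the sum telescopes to $(t_{last} - t_{first}) / |\mathbf{T}_v^C|$. The violent-group count excludes $v$ itself, which is our reading of the $v' \in N_v^1$ condition.

```python
from typing import FrozenSet, Iterable, NamedTuple

class Incident(NamedTuple):
    t: int                   # month index of a joint arrest
    people: FrozenSet[str]   # everyone arrested together in this incident
    violent: FrozenSet[str]  # the subset charged with a violent offense

def average_interval(times: Iterable[int]) -> float:
    """Table 7: sum of gaps between consecutive timepoints divided by |T_v^C|."""
    ts = sorted(set(times))
    if not ts:
        return 0.0
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return sum(gaps) / len(ts)

def violent_group_count(incidents: Iterable[Incident], v: str) -> int:
    """Number of timepoints at which v was arrested together with at least one
    other person charged with a violent offense."""
    return len({i.t for i in incidents if v in i.people and (i.violent - {v})})

log = [Incident(1, frozenset({"a", "b"}), frozenset({"b"})),
       Incident(3, frozenset({"a", "c"}), frozenset()),
       Incident(5, frozenset({"a", "d"}), frozenset({"a"})),
       Incident(7, frozenset({"a", "e"}), frozenset({"e"}))]
print(average_interval([i.t for i in log]), violent_group_count(log, "a"))
```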


5. EXPERIMENTAL RESULTS 

In this section, we review the results of our experiments. We looked at two types: experiments where the entire co-offender network is known beforehand (Section 5.1) and experiments where the network is discovered over time (Section 5.2). The intuition behind the experiments where the co-offender network is known is that the police often have additional information to augment co-arrestee data. This information can include informant reporting, individuals observed interacting by patrol officers, intelligence reporting, and information discovered on social media and the Internet. In our second type of experiment, we discover the network over time in an effort to mimic real-world operations; however, we also show that this makes the problem more difficult, as it reduces the power of the neighborhood-based and network-based features. Based on our discussions with the Chicago Police, we believe that real-world results will most likely fall somewhere between these two experiments. Operationally, we will not have full arrest data, but the aforementioned augmenting data sources are available (even though we did not have access to them for our experiments).

5.1 Known Co-Offender Network 

In this experiment we assume that the entire offender network is known. In other words, to compute the features for each vertex $v$, we assume that the set $\mathcal{V}_v$ is unknown while the rest of the network is observable. Here we compare our approach with THH but not with PVA, as we do not utilize time. In each of the experiments described in this section, we conduct 10-fold cross-validation. We consider the result of each approach to be the set of nodes that the approach considers potentially violent. Our primary metrics are precision (the fraction of reported violent individuals who were actually violent in the dataset), recall (the fraction of violent individuals in the dataset reported by the approach), F1 (the harmonic mean of precision and recall), and area under the ROC curve (AUC). We conduct two types of experiments: first, we study classification performance using only features within a given category (neighborhood, network, temporal, and geographic); then we study classification performance when the entire feature set is used with various classification algorithms and compare the results to THH.
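Because each approach returns a set of flagged nodes, the evaluation metrics reduce to set arithmetic. The sketch below computes precision, recall, and F1 for such a set and deals a node list into 10 disjoint folds; the helper names and the round-robin fold assignment are assumptions for illustration. As a check, the harmonic mean of the reported RF precision and recall (0.89 and 0.78) is about 0.83, matching the F1 in Table 8.

```python
import random
from typing import AbstractSet, Iterable, List, Tuple, TypeVar

T = TypeVar("T")

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def set_metrics(predicted: AbstractSet[str], actual: AbstractSet[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 when a method's output is a set of flagged nodes."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall, f1_score(precision, recall)

def k_folds(items: Iterable[T], k: int = 10, seed: int = 0) -> List[List[T]]:
    """Shuffle once and deal items round-robin into k disjoint folds."""
    pool = list(items)
    random.Random(seed).shuffle(pool)
    return [pool[i::k] for i in range(k)]

print(round(f1_score(0.89, 0.78), 2))
```

Each fold in turn serves as the test set while the classifier is trained on the other nine, and the per-fold metrics are averaged.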

Classification using single feature categories. Here we describe classification results using single feature categories. In this set of experiments, we use a random forest classifier (which we later show provides the best performance of the classifiers we examined). Figure 5 shows the performance of RF for the described categories. The network-based features are highly correlated with violent behavior, with an average F1 value of 0.72, compared to 0.63 for neighborhood, 0.21 for geographic, and 0.03 for temporal features. In Figure 6 we show the performance of one feature from each category in separating violent from non-violent offenders; the performance of each example is a good indicator of the performance of its category.



Figure 6: Example features from each category. (a)
Neighborhood-based: minority of 1-hop and majority of
2-hop neighbors committing a crime in C. (b)
Network-based: closeness (w.r.t. V). (c) Geographic:
beat violence. (d) Temporal: average interval months.
Figure 5: Precision, recall, and F1 comparison between
each group of features.

Classification comparison. Table 8 shows the performance
of different classification algorithms. According to Table 8,
RF provides the best performance (F1 = 0.83); we also note
that using SMOTE with RF did not improve this result.
Figure 8 shows that our approach outperforms THH. The
performance of our features is also illustrated in Figure 7:
the area under the ROC curve (AUC) when applying all
features is 0.98, higher than that of any single feature
category. The AUCs for the network-based, neighborhood-based,
geographic, and temporal categories are 0.92, 0.91, 0.65,
and 0.7, respectively. This indicates the importance of
network features for this classification task.

Figure 7: ROC curve for each feature set.
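The AUC values quoted above have a convenient probabilistic reading: the area under the ROC curve equals the probability that a randomly chosen violent individual receives a higher classifier score than a randomly chosen non-violent one, with ties counting one half. The minimal pairwise sketch below computes exactly that quantity. It is quadratic in the number of individuals and is meant only to illustrate the metric, not to reproduce the paper's evaluation code; the scores are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for six individuals (1 = violent, 0 = non-violent).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(round(roc_auc(scores, labels), 3))  # 0.889
```

Here 8 of the 9 (positive, negative) pairs are ranked correctly, giving 8/9. An AUC of 0.98 therefore means that almost every violent individual is scored above almost every non-violent one.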


Table 8: K-fold cross validation.

Method        Precision   Recall   F1
RF            0.89        0.78     0.83
RF w. SMOTE   0.86        0.78     0.82
NB            0.45        0.49     0.47
LR            0.68        0.49     0.57
DT            0.71        0.66     0.68
NN            0.64        0.57     0.6
SVM           0.73        0.2      0.31

Figure 8: Performance comparison between THH and RF
in K-fold cross validation.


5.2 Co-Offender Network Emerges Over Time

In this section, we present a more difficult experiment, in
which the co-arrestee network is discovered over time (by
virtue of arrests). To simulate this phenomenon, we split
our data into two disjoint sets: the first for learning and
identification, and the second for measuring performance.
We split the data monthly, starting from February 2013. To
illustrate the difficulty of this test, we show the number of
nodes, edges, and violent individuals per month in Figure 9.
In the early months, we are missing much of the graph data
(over 40% of nodes and edges in the first two months), which
makes many of our features less effective. However, as the
months progress, there are fewer violent individuals to
identify (due to the temporal nature of the dataset), which
amplifies the data imbalance over time.
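One way to picture this monthly evaluation is as an expanding window: for each test month starting in February 2013, everything observed before that month is available for learning and identification, and the month itself is used to measure performance. The sketch below illustrates only that reading. The record layout (one date per arrest) and the function name `monthly_splits` are hypothetical, and the paper's exact split protocol may differ.

```python
from datetime import date


def monthly_splits(arrest_dates, start=date(2013, 2, 1)):
    """Yield (test_month, train_idx, test_idx) for an expanding-window monthly split.

    arrest_dates: list of datetime.date, one per (hypothetical) arrest record.
    Training = all records before the test month; test = records inside that month.
    """
    months = sorted({date(d.year, d.month, 1) for d in arrest_dates if d >= start})
    for month in months:
        train = [i for i, d in enumerate(arrest_dates) if d < month]
        test = [i for i, d in enumerate(arrest_dates)
                if (d.year, d.month) == (month.year, month.month)]
        yield month, train, test


records = [date(2012, 11, 3), date(2013, 1, 20), date(2013, 2, 5),
           date(2013, 2, 28), date(2013, 3, 14)]
for month, train, test in monthly_splits(records):
    print(month.isoformat(), train, test)
# 2013-02-01 [0, 1] [2, 3]
# 2013-03-01 [0, 1, 2, 3] [4]
```

Under this reading, the training set grows each month while the test month holds fewer and fewer new violent individuals, which is the imbalance described above.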



Figure 9: Number of nodes, edges, and violent individuals
over time: more training data, but fewer offenders to identify.

In these experiments, we compare our approach, using
random forests with the full feature set, to THH and PVA.
We measure precision, recall, F1, the number of true
positives, and the number of false positives, and display
the results in Figures 10 and 11. In FRF (Filtered Random
Forest), we filter out offenders who have not committed any
crime in the last 200 days. This simple heuristic increases
precision drastically while preserving recall. The main
advantage of our method, besides its high precision, is its
ability to significantly reduce the population of potentially
violent offenders when compared to PVA, which had between
1813 and 3571 false positives each month. Figure 11 compares
the numbers of true and false positives for all approaches
for each month, except PVA (omitted for readability because
of its large number of false positives). While the F1
measure for PVA is higher than that of the other approaches,
its large number of false positives prevents law enforcement
from using it effectively in practice. Furthermore, PVA's
recall likely rises over time due to the drop in the number
of violent criminals to predict.
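The FRF heuristic amounts to a post-processing filter on the random-forest output: an individual flagged as potentially violent is kept only if their most recent recorded crime falls within the last 200 days. The following is a minimal sketch of that filter. The input mapping `last_crime` (individual to date of latest recorded crime) and the function name are hypothetical, and dropping individuals with no recorded date is a design choice of this sketch.

```python
from datetime import date, timedelta


def filter_recent(flagged, last_crime, as_of, window_days=200):
    """Keep flagged individuals whose latest crime is within `window_days` of `as_of`.

    flagged:    iterable of individual IDs predicted violent by the classifier
    last_crime: dict ID -> date of most recent recorded crime (hypothetical input)
    Individuals with no recorded crime date are dropped.
    """
    cutoff = as_of - timedelta(days=window_days)
    return {p for p in flagged if p in last_crime and last_crime[p] >= cutoff}


last_crime = {"p1": date(2013, 5, 20), "p2": date(2012, 6, 1), "p3": date(2013, 1, 10)}
print(sorted(filter_recent({"p1", "p2", "p3"}, last_crime, as_of=date(2013, 6, 1))))
# ['p1', 'p3']
```

Because the filter only removes individuals from the predicted set, it can raise precision but never adds true positives. Recall is preserved only to the extent that the violent individuals in the test month were recently active.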


6. RELATED WORK 

Though we believe that predicting violent offenders using
co-offender social networks is new, there has been previous
work on both co-offender networks in general and crime
forecasting. In this section, we briefly review some of the
relevant contributions in both of these areas.

There has been much previous work on co-offender networks.
The earlier work that studied these special social networks
primarily came from the criminology literature. For
instance, [14] applies social network analysis techniques to
several case studies in which the social network of the
criminal organization was known. In [13], the authors study
how the stability of these networks changes over time. More
recently, graph features derived from networks comprising
both offenders and victims have been shown to be related to
the probability of an individual becoming a victim of a
violent crime [17, 16]. Previous work has also looked at the
relationship between network structure and geography [18],
and has leveraged both network and geographic features to
predict criminal relationships [25] as well as to influence
gang members to dis-enroll [24]. Several software tools have
also been developed for conducting a wide range of analyses
on co-offender networks, including CrimeFighter [20],
CrimeLink [22], and ORCA [19]. However, our work departs
from these in that we leverage the network topology and
other features to identify violent offenders, a problem not
studied in any of this previous work.

There has also been a large amount of work on crime
forecasting (e.g., [7, 12]), though historically this work
has relied on spatio-temporal modeling of criminal behavior
[11, 21] or was designed to identify suspects for specific
crimes [26, 15]. None of this previous work was designed to
identify future violent offenders, nor did it leverage social
network structure.


7. CONCLUSION 

In this paper we explored the problem of identifying repeat
offenders who will commit violent crime. We showed a strong
relationship between network-based features and whether a
criminal will commit a violent offense, obtaining an
unbiased F1 score of 0.83 in our cross-validation experiment,
where we assumed that the underlying network was known. When
we moved to the case where the network was discovered over
time, our method significantly outperformed baseline
approaches, increasing precision and recall. We are currently
discussing ways to operationalize this technology with the
Chicago Police, as well as to design strategies to best
deploy police assets to areas with higher concentrations of
potentially violent offenders. We are also working with the
police to identify other sources of data to build a more
complete social network of the offenders.

Figure 10: Precision, recall, and F1 over time.

Appendix 

Shell Number. For a given graph, the k-core is the largest
subgraph in which each node has degree at least k. The
k-shell is the set of nodes in the k-core but not in any
higher core. A node's shell number is the k value of the
shell to which that node belongs. For a given node $v$ and
$C \subseteq V$, we define $\mathit{shell}_C(v)$ as the shell
number of node $v$ on the subgraph consisting of $v$ and all
nodes $v'$ where $C \cap V_{v'} \neq \emptyset$. We slightly
abuse notation and define $\mathit{shell}_V(v)$ as the shell
number of $v$ on the full network.

Figure 11: Number of true and false positive instances.
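The shell number defined above is what is often called the core number. It can be computed by repeatedly peeling off low-degree nodes: at threshold k, every node whose remaining degree is at most k is removed and assigned shell k. The sketch below does this and then evaluates $\mathit{shell}_C(v)$ on the restricted subgraph. As a hypothetical reading of the definition, it assumes each node $v'$ carries a set of crime categories `crimes[v']` playing the role of $V_{v'}$, so that $v'$ is kept when that set intersects $C$. The function names are illustrative.

```python
def shell_numbers(adj):
    """Return {node: shell (core) number} for an undirected graph.

    adj: dict node -> set of neighbours (symmetric, no self-loops).
    Nodes are peeled in rounds: at threshold k, every node whose remaining
    degree is <= k is removed (repeatedly) and assigned shell k.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # The next threshold is the smallest remaining degree (never decreasing).
        k = max(k, min(degree[v] for v in remaining))
        stack = [v for v in remaining if degree[v] <= k]
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue  # already peeled via another neighbour
            remaining.discard(v)
            shell[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        stack.append(u)
    return shell


def shell_c(adj, crimes, v, C):
    """Shell number of v in the subgraph of v plus every node whose crime set meets C.

    crimes: hypothetical dict node -> set of crime categories (stands in for V_{v'}).
    """
    keep = {v} | {u for u in adj if crimes.get(u, set()) & C}
    sub = {u: adj[u] & keep for u in keep}
    return shell_numbers(sub)[v]


# A triangle a-b-c with a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
crimes = {"a": {"violent"}, "b": {"violent"}, "c": {"narcotics"}, "d": {"violent"}}
print(sorted(shell_numbers(adj).items()))  # [('a', 2), ('b', 2), ('c', 2), ('d', 1)]
print(shell_c(adj, crimes, "a", {"violent"}))  # 1
```

Restricting to nodes with a "violent" record drops c, which breaks the triangle, so a's shell number falls from 2 on the full network to 1 on the restricted subgraph.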


Propagation Process. For a given node $v$ and the set of
activated nodes $V'$, we define $v$'s active neighbors as follows:

$$\mathit{act}_v(V') = \{u \mid u \in N^1_v \cap V'\}$$

We now define an activation function $A$ that, given an initial
set of active nodes, returns the set of active nodes after one
time step:

$$A_\kappa(V') = V' \cup \{v \in V \text{ s.t. } |\mathit{act}_v(V')| \geq \kappa\}$$

We also note that the activation function can be applied
iteratively to model a diffusion process. Hence, we use the
following notation to signify multiple applications of $A$
(for natural numbers $t \geq 1$):

$$A^t_\kappa(V') = \begin{cases} A_\kappa(V') & \text{if } t = 1 \\ A_\kappa\left(A^{t-1}_\kappa(V')\right) & \text{otherwise} \end{cases}$$


Clearly, when $A^t_\kappa(V') = A^{t-1}_\kappa(V')$, the process
has converged. Further, it always converges in no more than
$|V|$ steps, since the process must activate at least one new
node in each step prior to converging. Based on this idea, we
define the function $\Gamma$, which returns the set of all nodes
activated upon convergence of the activation function. We
define $\Gamma_\kappa(V') = A^t_\kappa(V')$, where $t$ is the least
value such that $A^t_\kappa(V') = A^{t-1}_\kappa(V')$ (taking
$A^0_\kappa(V') = V'$).
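The propagation process is a threshold-style diffusion, and $\Gamma_\kappa$ can be computed by applying $A_\kappa$ until the active set stops changing. The following is a minimal sketch. It reads the activation condition as inclusive, $|\mathit{act}_v(V')| \geq \kappa$, and represents the graph as a hypothetical adjacency dict mapping each node to the set of its 1-hop neighbours. The function names mirror the appendix notation but are illustrative.

```python
def active_neighbors(adj, v, active):
    """act_v(V'): the 1-hop neighbours of v that are already active."""
    return adj[v] & active


def activate_once(adj, active, kappa):
    """A_kappa(V'): V' plus every node with at least kappa active neighbours."""
    return active | {v for v in adj if len(active_neighbors(adj, v, active)) >= kappa}


def gamma(adj, seed, kappa):
    """Gamma_kappa(V'): apply A_kappa until a fixed point is reached.

    Each non-final step activates at least one new node, so the loop
    runs at most |V| + 1 times.
    """
    current = set(seed)
    while True:
        nxt = activate_once(adj, current, kappa)
        if nxt == current:
            return current
        current = nxt


adj = {"a": {"c"}, "b": {"c"}, "c": {"a", "b", "d"}, "d": {"c", "e"}, "e": {"d"}}
print(sorted(gamma(adj, {"a", "b"}, kappa=2)))  # ['a', 'b', 'c']
print(sorted(gamma(adj, {"a"}, kappa=1)))       # ['a', 'b', 'c', 'd', 'e']
```

With $\kappa = 2$, node c is activated by its two active neighbours a and b, but d then has only one active neighbour, so the process stops. With $\kappa = 1$, activation spreads along every edge until the whole connected component is active.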